August 30, 2004
PCI Express

Introduction

In my editorial previewing DAC I identified the SIGGRAPH International Conference on Computer Graphics and Interactive Techniques as a comparable event insofar as it is a high tech show targeted at professionals (digital content creators and graphics professionals) and has a parallel conference with high quality tutorials, panels and technical papers. This year the show was held in Los Angeles from August 8th to 12th, drawing 27,825 professionals including yours truly. SIGGRAPH (Special Interest Group on Computer Graphics and Interactive Techniques) is part of the Association for Computing Machinery (ACM).

Decades ago the SIGGRAPH show was dominated by CAD companies and by vendors of plotters, monitors and workstations. A lot has changed over the years. This year's show highlighted the graphics-related products, tools, and technologies (current and future) used to create feature films, television programs, commercials, music and corporate videos, game production, web design and interactive web streaming. Hardware exhibitors included firms with equipment for motion capture, scanning, video effects, digital video, graphics boards and processors. Software exhibitors included firms with offerings for 3D modeling, animation, video production, visualization, rendering, streaming, video encoding and compression, and visual effects.

We are all familiar with the stunning visual effects and imagery that are commonplace in movies, television and on the web. Because of the nature of the show there was an electronic theater, an animation theater, a cyberfashion show and an art gallery, all showing a blend of art and technology. Walking around the exhibit floor was like watching movie trailers in a movie theater. Impressive demonstrations of how these effects are produced were given by the likes of Discreet (3ds max), Alias (Maya), Pixar (RenderMan) and Avid/Softimage (XSI). Clearly these software packages require significant graphics and computational horsepower.
The computer graphics industry has come a long, long way since the days of Tron (1982) and the Norelco shaver commercial based upon technology from MAGI (Mathematical Applications Group, Inc.).

Leading vendors of graphics chips and graphics accelerator boards, including NVIDIA, ATI and 3Dlabs, were also exhibiting. All three made significant product announcements during SIGGRAPH.

NVIDIA

NVIDIA Corporation was founded in 1993. The company designs, develops and markets graphics processing units (GPUs), media and communications processors (MCPs), ultra-low power media processors (UMPs), and related software. These products have been incorporated into a wide variety of computing platforms, including consumer PCs, enterprise PCs, notebook PCs, professional workstations, handhelds, and video game consoles. NVIDIA is headquartered in Santa Clara, California and employs more than 2,000 people worldwide. In 2003 NVIDIA had revenues of $1.82 billion: North America $405 million, Asia Pacific $1.26 billion and Europe $106 million, with 4 customers accounting for 60% of revenue.

On August 9th NVIDIA introduced its Quadro FX 4400 model, part of a distinctive new family of professional graphics products based on the PCI Express bus architecture. The new products include: the FX 4400, the performance leader (135 million triangles/sec, 6.4 billion texels/sec fill rate), featuring 512MB of GDDR3 frame buffer memory, a 256-bit memory interface, 35.2GB/sec of memory bandwidth, 3-pin stereo support, and dual DVI display connectors; the FX 4400G, delivering genlock and framelock capabilities; the FX 1400, the popular price-performance model; and the FX 540, with high definition component output support for video previewing and recording.

By using an innovative PCI Express high-speed interconnect (HSI), a complex piece of networking technology that performs seamless, bi-directional interconnect protocol conversion at incredible line speeds, NVIDIA can transform its award-winning GeForce FX series into a full family of PCI Express GPUs.
This approach allows the firm to manufacture one GPU with support for two interfaces: PCIe and AGP. In the NV3x family, AGP is supported natively, and the HSI bridge provides quick access to PCI Express operability. For the NV4x family, with native PCI Express support, the same HSI bridge can be reversed to produce an AGP variant of the board.

The NVIDIA Quadro models support SLI (Scalable Link Interface), a new technology that enables two Quadro FX graphics boards to operate in a single workstation. SLI is based on an intelligent communication protocol embedded in the GPU and a high-speed digital interface on the graphics board to facilitate data flow. An extensive suite of software provides dynamic load balancing and advanced rendering and compositing to ensure smooth frame rates and image quality.

NVIDIA and notebook manufacturers have co-designed the MXM (Mobile PCI Express Module) interface to provide a consistent interface for mobile PCI Express graphics. The MXM initiative supports a wide range of graphics solutions from any GPU manufacturer.

ATI Technologies Inc.

Founded in 1985, ATI Technologies Inc. is a leader in the supply of graphics, video and multimedia solutions for desktop personal computers, mobile computing, DTV, cell phones, handhelds, consoles and workstation products. ATI Technologies comprises three core business units: desktop; integrated and mobile; and consumer. ATI employs 2,200 people and in 2003 had revenues of $1.38 billion. By region: Canada $20 million, US $285 million, Europe $113 million, and Asia Pacific $993 million. By product: Components $962 million, Boards $397 million, and Other $25 million.

On June 1st ATI announced a complete line-up of FireGL workstation graphics accelerators with full native support for the PCI Express bus architecture. The newly branded Visualization series will offer four new graphics accelerators: the high-end FireGL V7100, the mid-range FireGL V5100, and, redefining the entry-level segment, the FireGL V3200 and the FireGL V3100.
With 256 MB of memory and a visual processing unit (VPU) architected with 16 pixel pipelines and six vertex processors, the FireGL V7100 doubles the rendering power of ATI's previous high-end FireGL products. The FireGL V7100 supports dual DVI configurations as well as dual link, which gives it the ability to drive ultra-high-resolution 9-megapixel displays.

On August 13th ATI announced that it had shipped its one millionth native PCI Express visual processing unit (VPU), speeding the industry transition to PCI Express.

3Dlabs

3Dlabs was acquired by Singapore multinational Creative Technology Ltd. in early 2002 for $104 million and now operates as a wholly owned subsidiary. Creative was founded in Singapore on July 1, 1981. Product segments include audio (Sound Blaster), speakers, personal digital entertainment, communication and graphics. For 2003 Creative had revenue of $702 million. Sales of graphics products are roughly 10% of revenue.

On June 15th 3Dlabs announced the PCI Express-based Wildcat Realizm 800. The Wildcat Realizm 800 features a unique Wildcat Realizm Vertex/Scalability Unit (VSU) and dual Wildcat Realizm Visual Processing Units (VPUs) to deliver over 700 GFLOPS of floating-point graphics processing. These work together to enable a software-compatible family of graphics accelerators ranging from a single-VPU AGP 8x solution to a unique dual-VPU configuration, which takes full advantage of the enhanced bandwidth of PCI Express.

The Wildcat Realizm VSU receives graphics commands at full bandwidth from a 16-lane PCI Express interface and processes vertices at 67 billion floating point operations per second in a powerful SIMD array of highly optimized vector processors. The VSU is then able to drive two VPUs at full bandwidth over an 8.4GB/sec interface while optimally distributing graphical primitives between the two VPUs to achieve a genuine doubling of both geometry and fill-rate performance.
The Wildcat Realizm 800 is slated for availability in the third calendar quarter of this year at an MSRP of US$2,799.

OpenGL

Graphics processing units are programmable. The industry standard for this purpose is OpenGL. The OpenGL API (Application Programming Interface) began as an initiative by Silicon Graphics Inc. (SGI) to create a single, vendor-independent API for the development of 2D and 3D graphics applications. The specification was largely based on earlier work on the SGI IRIS GL library. SGI produced a sample implementation that hardware vendors could use to develop OpenGL drivers for their hardware. The sample implementation has been released under an open source license.

Modifications to the OpenGL API are made through the OpenGL Architecture Review Board (ARB), an independent consortium formed in 1992 that governs the OpenGL specification. Composed of many of the industry's leading graphics vendors, the ARB defines conformance tests and approves new OpenGL features and extensions. As of October 2003, voting members of the ARB include 3Dlabs, Apple, ATI, Dell Computer, Evans & Sutherland, Hewlett-Packard, IBM, Intel, Matrox, NVIDIA, SGI and Sun. Many OpenGL extensions have been defined by vendors and groups of vendors.

The OpenGL Utility Library (GLU) provides many modeling features, such as quadric surfaces and NURBS curves and surfaces. GLU is a standard part of every OpenGL implementation. There is also a higher-level, object-oriented toolkit, Open Inventor, which is built atop OpenGL and is available separately for many implementations of OpenGL.

The figure below gives an overview of the traditional graphics pipeline. Geometry (vertices, lines, polygons) and pixel data (pixels, images, bitmaps) take different routes.

[Figure: OpenGL Pipeline Architecture]

All geometric primitives are described by vertices. Even parametric curves and surfaces can be mathematically defined by a net of control points.
Evaluators are used to calculate vertex properties such as surface normals, texture coordinates, colors, and spatial coordinate values. Per-vertex operations include transformations (scaling, translation, rotation) and projections from 3D to 2D. Advanced operations for lighting and texture may also be applied. Primitives (lines, polygons, bitmaps) are then assembled and clipped.

Rasterization is the conversion of both geometric and pixel data into fragments. Each fragment square corresponds to a pixel in the framebuffer. Line and polygon stipples, line width, point size, shading model, and coverage calculations to support antialiasing are taken into consideration as vertices are connected into lines or the interior pixels are calculated for a filled polygon. Color and depth values are assigned for each fragment square. Blending, dithering, logical operations, and masking by a bitmask may also be performed.

Graphics application vendors (digital content creation, CAD/CAM, etc.) build sophisticated software on top of OpenGL with interactive GUIs to define, edit and display models, animations and videos. Complex operations can be invoked by drag-and-drop techniques.

PCI Express Intro

All of the graphics products described above have in common that they are PCI Express based. Jim Pappas, Director of Initiative Marketing for Intel's Enterprise Platform Group, says, "The graphics industry is expected to make a rapid transition to PCI Express taking advantage of the technology's increased performance characteristics."

According to Jen-Hsun Huang, president and CEO at NVIDIA, "The PCI Express transition is going to be an exciting time for the PC industry. By aligning ourselves closely with Intel and helping define this new specification, we were able to engineer an innovative protocol engine, in HSI, that delivers the full PCI Express feature set without any compromises.
HSI and PCI Express will enable a new level of performance for high bandwidth applications like graphics and networking."

"Since the outset of the PCI Express initiative, our aim was to deliver a top-to-bottom family of PCI Express graphics cards to our OEM customers, hastening the PCI Express transition," said Rick Bergman, Senior Vice President Marketing and General Manager, Desktop, ATI Technologies. "PCI-E is a major innovation in the computer architecture and there is a rapid transition in the market to this bus standard."

Note that while ATI and NVIDIA agree on the importance of PCI Express, the two companies are initially supporting PCI Express in very different ways: ATI will provide PCI Express compatibility with a new line of GPUs that offer native PCI-E support, while NVIDIA's first PCI Express efforts will use a High-Speed Interconnect (HSI) bridge chip to graft AGP GPUs to the PCI-E interface. This enables NVIDIA to maintain parallel AGP and PCI-E GPU lines. The two firms are arguing in public over the strengths and weaknesses of the bridge approach.

Before describing PCI Express, we should review PCI, the bus it is replacing.

PCI

Formed in 1992, PCI-SIG (originally the Peripheral Component Interconnect Special Interest Group) is the industry organization chartered with the development and management of the PCI bus specification, the industry standard for a high-performance I/O interconnect to transfer data between a CPU and its peripherals. The PCI-SIG currently has more than 800 member companies.

The PCI (Peripheral Component Interconnect) bus structure introduced in 1992 has been the mainstay for over a decade. The original 33-MHz, 32-bit implementation delivers a peak theoretical bandwidth of 133 megabytes per second. Later generations of backwards-compatible PCI bus specifications emerged to improve performance, including a more recent 64-bit, 66MHz combination with a bandwidth of 533MB/s.
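The peak figures for a parallel bus such as PCI come from multiplying the bus width in bytes by the clock rate. The following is a minimal Python sketch of that arithmetic, not anything taken from the PCI specification. The function name is illustrative, and the nominal clocks of 33.33 MHz and 66.66 MHz are an assumption standing in for the rounded 33 and 66 MHz values quoted in the text.

```python
def parallel_bus_peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak theoretical bandwidth in MB/s (1 MB = 10**6 bytes).

    Assumes one transfer of the full bus width on every clock cycle,
    which is the best case and ignores arbitration and wait states.
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz


# Nominal clock values are assumptions; the article rounds them to 33 and 66 MHz.
pci_32_33 = parallel_bus_peak_mb_per_s(32, 33.33)
pci_64_66 = parallel_bus_peak_mb_per_s(64, 66.66)

print(f"32-bit, 33 MHz PCI: {pci_32_33:.0f} MB/s")
print(f"64-bit, 66 MHz PCI: {pci_64_66:.0f} MB/s")
```

Because every device on a shared PCI bus competes for the same cycles, these numbers are upper bounds for the whole bus rather than per-device rates.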
PCI-X 1.0, with a maximum clock speed of 133 MHz, was developed to increase the bus speed, reduce latency and improve the protocols, doubling the bus width from 32 bits to 64 bits. The PCI-X 2.0 specification extends the bus frequency to 266 MHz and 533 MHz and adds advanced features like ECC.

Also introduced was the Accelerated Graphics Port (AGP) specification, which defined a dedicated high-speed PCI bus for graphics operations. The AGP bus offloaded graphics traffic from the PCI system bus and freed up bandwidth for other communications and I/O operations. The initial version of AGP was a 32-bit bus running at 66 MHz with a peak data transfer rate of 266 MB/s. AGP has evolved to AGP2X, AGP4X, and finally today's AGP8X, which operates at 2.134 gigabytes per second (GB/sec).

In addition, Intel recently added dedicated USB 2.0 and Serial ATA links to the Southbridge in its chip sets, further reducing the I/O demands on the PCI bus.

Comparison of Bus Architecture Performance

The PCI architecture is shown in the diagram below.

[Figure: PCI Architecture]

Note that the Host Bridge is often referred to as the Northbridge, while the I/O Bridge is referred to as the Southbridge. The Northbridge connects to the fastest devices, namely the CPU, memory and graphics. The Southbridge routes traffic from the different I/O devices on the system (the hard drives, USB ports, Ethernet ports, etc.) to the Northbridge and on to the CPU and/or memory. Because PCI is not fast enough for some devices, the trend has been to attach interfaces (SATA, USB) directly to the Southbridge. Thus we now have a collection of specialized buses with different protocols and bandwidth capabilities.

The demands of emerging computing and communications platforms exceed the capabilities of the traditional 32-bit, 33 MHz PCI bus.
Technical innovations such as 10 GHz+ CPU speeds, faster memory, higher-speed graphics, gigabit networking, 1394b, and other applications will drive the need for much greater internal system bandwidth. For example, both 1394b and Gigabit Ethernet require bandwidth that exceeds PCI's current shared maximum of 133MB/sec.

The general consensus is that PCI and AGP have reached their limits while the demand for increased performance and bandwidth only increases. The PCI bus cannot be easily scaled up in frequency or down in voltage. In addition, the PCI bus does not support features such as advanced power management, native hot plugging/hot swapping of peripherals, or Quality of Service (QoS) to guarantee bandwidth for real-time operations. Finally, all of the available bandwidth of the PCI bus is limited to one direction (send or receive) at a time.

PCI Express (PCIe)

PCI-SIG (the Peripheral Component Interconnect Special Interest Group) defines PCI Express as "...an open specification designed from the start to address the wide range of current and future system interconnect requirements of multiple market segments in the computing and communications industries. The PCI Express Architecture defines a flexible, scalable, high-speed, serial, point-to-point, hot pluggable/hot swappable interconnect that is software-compatible with PCI."

PCI Express (formerly 3GIO) is a new I/O technology that is compatible with the current PCI software environment. PCI Express defines a packetized protocol and a load/store architecture. Its layered architecture enables attachment to copper, optical, or emerging physical signaling media. PCI Express uses an embedded clocking scheme to enable better frequency scaling and provides many advanced features as well as innovative form factors.
It can be used for chip-to-chip and add-in card applications to provide connectivity for adapter cards, as a graphics I/O attach point for increased graphics bandwidth, and as an attach point to other interconnects like 1394b, USB 2.0, InfiniBand Architecture and Ethernet.

Multiple point-to-point connections introduce a new element, the switch, into the I/O system topology. The switch replaces the multi-drop bus and is used to provide fan-out for the I/O bus. A switch may provide peer-to-peer communication between different endpoints, and this traffic, if it does not involve cache-coherent memory transfers, need not be forwarded to the host bridge.

The PCI Express architecture defines a high-performance, point-to-point, scalable, serial bus. A PCI Express link consists of dual simplex channels, implemented as a transmit pair and a receive pair for simultaneous transmission in each direction. Each pair is a low-voltage, differentially driven pair of signals. A data clock is embedded in each pair, using an 8b/10b encoding scheme to achieve very high data rates. The initial frequency is 2.5Gb/s/direction and is expected to increase with silicon technology to 10Gb/s/direction (the practical maximum for signals in copper).

PCI Express Physical Layer

The bandwidth of a PCI Express link may be linearly scaled by adding signal pairs to form multiple lanes. The physical layer supports x1, x2, x4, x8, x12, x16 and x32 lane widths and splits the byte data across the lanes. Each byte is transmitted, with 8b/10b encoding, across the lane(s). This data disassembly and re-assembly is transparent to other layers.

PCI Express provides I/O attach points for high-performance graphics, 1394b, USB 2.0, InfiniBand Architecture, Gigabit networking and so on. PCI Express will be available in a number of different I/O expansion formats, depending on the platform - notebook, desktop, or server.
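The lane scaling described in the physical-layer discussion above can be sketched in a few lines of Python. This is an illustrative calculation only: the function and constant names are hypothetical, and it counts nothing beyond the 8b/10b encoding overhead, so packet headers and other protocol overhead (which reduce usable throughput further) are ignored.

```python
LINE_RATE_GBPS = 2.5          # raw signalling rate per lane, per direction (from the text)
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: every 10 bits on the wire carry 8 data bits
LANE_WIDTHS = (1, 2, 4, 8, 12, 16, 32)  # widths listed in the text


def pcie_peak_mb_per_s(lanes: int, line_rate_gbps: float = LINE_RATE_GBPS) -> float:
    """Peak data bandwidth per direction, in MB/s, for a link of the given width."""
    data_gbps = line_rate_gbps * ENCODING_EFFICIENCY * lanes
    return data_gbps * 1000 / 8  # convert Gb/s to MB/s


for lanes in LANE_WIDTHS:
    print(f"x{lanes}: {pcie_peak_mb_per_s(lanes):.0f} MB/s per direction")
```

Because each lane has separate transmit and receive pairs, the per-direction figure is available in both directions at once, unlike the shared, one-direction-at-a-time PCI bus.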
Servers, which require larger bandwidths to service I/O requirements, will have more PCI Express slots, and these slots will provide higher PCI Express lane counts. In contrast, a notebook may use the PCI Express architecture internally but expose only a single x1 lane for medium-speed peripherals.

The PCI Express architecture is a high-speed, general-purpose serial I/O interconnect that provides the bandwidth required for current and future applications. It has already caused ripple effects, as evidenced by the actions of complementary standards organizations.

ASI SIG

The Advanced Switching Interconnect Special Interest Group (ASI SIG) is a nonprofit collaborative trade organization chartered with providing a switched fabric interconnect standard for the communications and compute industries. Advanced Switching is a standards-based switched-interconnect and data-fabric architecture based on PCI Express technology for connecting system boards and components in future products. Advanced Switching uses the same physical and link layers as the PCI Express architecture to achieve widespread interoperability and availability of technology. Together, PCI Express and Advanced Switching technologies ensure broadly available building blocks and tools that enable component and equipment makers to reuse technology across multiple products, reduce design costs and shorten the time it takes to get products to market.

Express Card

PCMCIA (Personal Computer Memory Card International Association) is an international standards body and trade association with over 200 member companies that was founded in 1989 to establish standards for Integrated Circuit cards and to promote interchangeability among computer systems. In September 2003 PCMCIA introduced ExpressCard (code name NEWCARD) as a new standard for hot-swappable system modules, which it believes will replace CardBus as the preferred solution for end user add-ins.
Based on the PCI Express architecture and Universal Serial Bus (USB) 2.0 interfaces, ExpressCard connects directly to chipsets, removing the need for a bridge component. It supports dual-direction, single-lane PCI Express, which translates to a peak data rate of 250MB/sec in each direction, compared with 132MB/sec for the PC Card standard. There are two standard formats of ExpressCard modules: the ExpressCard/34 module, which is 34 mm wide, and the ExpressCard/54 module, which is 54 mm wide. Both modules are 75 mm long and 5 mm high, and both have 26 pins compared with the 68 pins of the CardBus cards they would replace. Both also dissipate less than 1.3 watts. By combining USB 2.0 and PCI Express interfaces in a single form factor, ExpressCard makes it easier to expand a machine in a variety of ways without opening the box to gain access to slots.

EDA Vendors

The graphics chips described above are the types of semiconductors that push the envelope of EDA toolsets. The customers of EDA vendors will have to deal more directly with PCI Express. The list below contains links to vendor announcements this summer related to PCI Express.
Synopsys' DesignWare IP Core for PCI Express First to Pass PCI-SIG Compliance Tests
Cadence Incisive Palladium System Cuts NVIDIA's Verification Time in Half; Palladium Accelerator/Emulator Speeds Verification of NVIDIA's Newest Graphics Processor
Synopsys' New PCI Express PHY IP Enables Lower Cost ICs
Cadence and Rambus Sign Agreements to Deliver Portfolio of High-Speed Serial Link Solutions
Xyratex Adopts Mentor Graphics PCI Express Intellectual Property for Advanced Switching Industry Standard
Rambus and Mentor Graphics Collaborate to Offer Interoperable PCI Express Solutions; Proven PCI Express-Compliant Solutions Now Available to Chip Designers
Agere Systems Introduces Advanced PCI Express(R) and Gigabit Ethernet Interface Solutions

Weekly Highlights

NEC Electronics America and Synplicity to Co-Host Seminar on Structured ASICs and Amplify ISSP Software
Mentor Graphics FastScan ATPG Tool Selected for UMC's 130 and 90 Nanometer Reference Flow
International Engineering Consortium's Euro DesignCon 2004 to Feature Vast Array of Technical Papers
Precision RTL Synthesis Tool From Mentor Graphics Delivers Excellent QoR for Designs Using Actel's ProASIC Plus Devices
Apache's Physical Power Integrity Flow Adopted by ATI Technologies
Synopsys' DesignWare IP Core for PCI Express First to Pass PCI-SIG Compliance Tests
Mentor Graphics Adds Serial ATA IP with Acquisition of Palmchip Intellectual Property Business
Third Annual Asia Cadence Technology Symposium - ACTS Scheduled August 24 Through September 3
Amkor Completes Acquisition of Unitive
Gartner Says Worldwide Semiconductor Revenue On Pace for 27 Percent Growth in 2004
ARM and Artisan Combine to Deliver System-on-Chip IP Solutions